Main Datasets (w/ hospitalised data)

Sources:
  • https://covidtracking.com/
  • https://github.com/CSSEGISandData/COVID-19
  • Various state, third-party, and federal data

Combine, validate, and verify data sets.

# Preview the filtered main dataframe (first 50 rows, one row per state):
all_cases.head(50)
date state positive active hospitalizedCurrently hospitalizedCumulative inIcuCurrently onVentilatorCurrently recovered dataQualityGrade ... totalTestsViral positiveTestsViral negativeTestsViral positiveCasesViral commercialScore negativeRegularScore negativeScore positiveScore score grade
0 2020-06-29 AK 904.0 365.0 16.0 NaN NaN 1.0 525.0 A ... 108709.0 NaN NaN NaN 0 0 0 0 0 NaN
1 2020-06-29 AL 37175.0 17380.0 715.0 2725.0 NaN NaN 18866.0 B ... NaN NaN NaN 36682.0 0 0 0 0 0 NaN
2 2020-06-29 AR 20257.0 5926.0 300.0 1380.0 NaN 63.0 14066.0 A ... NaN NaN NaN 20257.0 0 0 0 0 0 NaN
4 2020-06-29 AZ 74533.0 63766.0 2721.0 4634.0 679.0 465.0 9179.0 A+ ... 511009.0 NaN NaN 74119.0 0 0 0 0 0 NaN
5 2020-06-29 CA 216550.0 NaN 6179.0 NaN 1686.0 NaN NaN B ... 4061692.0 NaN NaN 216550.0 0 0 0 0 0 NaN
6 2020-06-29 CO 32307.0 26172.0 234.0 5401.0 NaN NaN 4459.0 A ... NaN NaN NaN 29476.0 0 0 0 0 0 NaN
7 2020-06-29 CT 46362.0 33989.0 99.0 10268.0 NaN NaN 8053.0 B ... 442998.0 NaN NaN 44384.0 0 0 0 0 0 NaN
8 2020-06-29 DC 10292.0 8541.0 121.0 NaN 34.0 23.0 1200.0 A+ ... NaN NaN NaN 10292.0 0 0 0 0 0 NaN
9 2020-06-29 DE 11376.0 4204.0 72.0 NaN 15.0 NaN 6665.0 A+ ... NaN NaN NaN 10306.0 0 0 0 0 0 NaN
10 2020-06-29 FL 146341.0 NaN NaN 14651.0 NaN NaN NaN A ... 2299430.0 189183.0 2106109.0 146341.0 0 0 0 0 0 NaN
11 2020-06-29 GA 79417.0 NaN 1359.0 10824.0 NaN NaN NaN A ... 823117.0 73044.0 750073.0 79417.0 0 0 0 0 0 NaN
13 2020-06-29 HI 899.0 162.0 NaN 111.0 NaN NaN 719.0 D ... 89866.0 899.0 88967.0 899.0 0 0 0 0 0 NaN
14 2020-06-29 IA 28782.0 10223.0 119.0 NaN 35.0 18.0 17851.0 A+ ... NaN NaN NaN 28782.0 0 0 0 0 0 NaN
15 2020-06-29 ID 5319.0 1330.0 NaN 312.0 NaN NaN 3898.0 A ... 85816.0 NaN NaN 4790.0 0 0 0 0 0 NaN
16 2020-06-29 IL 143514.0 NaN 1501.0 NaN 372.0 187.0 NaN A ... 1571896.0 NaN NaN 142461.0 0 0 0 0 0 NaN
17 2020-06-29 IN 45228.0 8256.0 626.0 7024.0 265.0 85.0 34348.0 A+ ... NaN NaN NaN 45228.0 0 0 0 0 0 NaN
18 2020-06-29 KS 14443.0 13379.0 NaN 1152.0 NaN NaN 794.0 A ... NaN NaN NaN 14443.0 0 0 0 0 0 NaN
19 2020-06-29 KY 15347.0 10848.0 381.0 2602.0 69.0 NaN 3939.0 B ... NaN NaN NaN 14835.0 0 0 0 0 0 NaN
20 2020-06-29 LA 57081.0 11657.0 737.0 NaN NaN 79.0 42225.0 B ... NaN NaN NaN 57081.0 0 0 0 0 0 NaN
21 2020-06-29 MA 108768.0 NaN 762.0 11345.0 138.0 79.0 NaN A+ ... 1057932.0 NaN NaN 103628.0 0 0 0 0 0 NaN
22 2020-06-29 MD 67254.0 59100.0 447.0 10822.0 160.0 NaN 4979.0 A ... 644026.0 NaN NaN 67254.0 0 0 0 0 0 NaN
23 2020-06-29 ME 3219.0 491.0 31.0 347.0 8.0 4.0 2623.0 A ... 93142.0 3884.0 89123.0 2863.0 0 0 0 0 0 NaN
24 2020-06-29 MI 70223.0 12963.0 557.0 NaN 193.0 106.0 51099.0 A+ ... 1046545.0 87393.0 959152.0 63497.0 0 0 0 0 0 NaN
25 2020-06-29 MN 35861.0 3166.0 278.0 4031.0 140.0 NaN 31225.0 A ... 592955.0 NaN NaN 35861.0 0 0 0 0 0 NaN
26 2020-06-29 MO 21043.0 NaN 599.0 NaN NaN NaN NaN B ... 424214.0 23527.0 399926.0 21043.0 0 0 0 0 0 NaN
28 2020-06-29 MS 26567.0 6120.0 719.0 3115.0 158.0 93.0 19388.0 A ... 283355.0 NaN NaN 26400.0 0 0 0 0 0 NaN
29 2020-06-29 MT 919.0 288.0 13.0 100.0 NaN NaN 609.0 C ... NaN NaN NaN 919.0 0 0 0 0 0 NaN
30 2020-06-29 NC 63484.0 16621.0 843.0 NaN NaN NaN 45538.0 A ... NaN NaN NaN 63484.0 0 0 0 0 0 NaN
31 2020-06-29 ND 3539.0 288.0 25.0 227.0 NaN NaN 3163.0 D ... 180558.0 NaN NaN 3539.0 0 0 0 0 0 NaN
32 2020-06-29 NE 18899.0 5310.0 117.0 1316.0 NaN NaN 13322.0 B ... NaN NaN NaN 18899.0 0 0 0 0 0 NaN
33 2020-06-29 NH 5760.0 958.0 34.0 565.0 NaN NaN 4435.0 B ... NaN NaN NaN 5760.0 0 0 0 0 0 NaN
34 2020-06-29 NJ 171272.0 126117.0 978.0 19847.0 225.0 185.0 30163.0 A+ ... NaN NaN NaN 171272.0 0 0 0 0 0 NaN
35 2020-06-29 NM 11809.0 6053.0 114.0 1865.0 NaN NaN 5264.0 B ... NaN NaN NaN 11809.0 0 0 0 0 0 NaN
36 2020-06-29 NV 17894.0 16701.0 555.0 NaN 124.0 65.0 689.0 A+ ... 314388.0 NaN NaN 17894.0 0 0 0 0 0 NaN
37 2020-06-29 NY 392930.0 297653.0 853.0 89995.0 216.0 136.0 70435.0 A ... NaN NaN NaN 392930.0 0 0 0 0 0 NaN
38 2020-06-29 OH 51046.0 NaN 669.0 7746.0 245.0 115.0 NaN B ... NaN NaN NaN 47524.0 0 0 0 0 0 NaN
39 2020-06-29 OK 13172.0 3200.0 329.0 1489.0 134.0 NaN 9587.0 A+ ... 327683.0 13941.0 313021.0 13172.0 0 0 0 0 0 NaN
40 2020-06-29 OR 8485.0 5581.0 151.0 1025.0 44.0 25.0 2700.0 A+ ... NaN NaN 226648.0 8121.0 0 0 0 0 0 NaN
41 2020-06-29 PA 85988.0 12304.0 635.0 NaN NaN 111.0 67070.0 A+ ... NaN NaN NaN 83529.0 0 0 0 0 0 NaN
43 2020-06-29 RI 16764.0 14191.0 73.0 1995.0 15.0 14.0 1627.0 A+ ... NaN NaN NaN 16764.0 0 0 0 0 0 NaN
44 2020-06-29 SC 34644.0 20468.0 1032.0 2622.0 NaN NaN 13456.0 A ... 369207.0 44241.0 324966.0 34546.0 0 0 0 0 0 NaN
45 2020-06-29 SD 6716.0 807.0 70.0 657.0 NaN NaN 5818.0 B ... NaN NaN NaN 6716.0 0 0 0 0 0 NaN
46 2020-06-29 TN 42297.0 14743.0 512.0 2599.0 NaN NaN 26962.0 B ... 776858.0 48873.0 727985.0 41949.0 0 0 0 0 0 NaN
47 2020-06-29 TX 153011.0 69273.0 5913.0 NaN NaN NaN 81335.0 B ... 1819189.0 NaN NaN NaN 0 0 0 0 0 NaN
48 2020-06-29 UT 21664.0 9291.0 254.0 1417.0 80.0 NaN 12205.0 A+ ... NaN NaN NaN 21664.0 0 0 0 0 0 NaN
49 2020-06-29 VA 62189.0 52426.0 796.0 8823.0 225.0 101.0 8023.0 A+ ... 633705.0 NaN NaN 59522.0 0 0 0 0 0 NaN
51 2020-06-29 VT 1208.0 203.0 13.0 NaN NaN NaN 949.0 B ... NaN NaN NaN 1208.0 0 0 0 0 0 NaN
52 2020-06-29 WA 31752.0 NaN 270.0 4275.0 NaN 52.0 NaN B ... NaN NaN NaN 31752.0 0 0 0 0 0 NaN
53 2020-06-29 WI 31033.0 8032.0 237.0 3407.0 90.0 NaN 22217.0 A+ ... NaN NaN NaN 28058.0 0 0 0 0 0 NaN
54 2020-06-29 WV 2870.0 581.0 28.0 NaN 5.0 3.0 2196.0 B ... NaN NaN NaN 2771.0 0 0 0 0 0 NaN

50 rows × 25 columns

# Add state-level data: total beds, beds per 1,000 people, population, abbreviation, and name:
all_cases.head(50)
date state abbrev population positive active hospitalizedCurrently hospitalizedCumulative inIcuCurrently onVentilatorCurrently ... negativeTestsViral positiveCasesViral commercialScore negativeRegularScore negativeScore positiveScore score grade bedsPerThousand total_beds
0 2020-06-29 Alaska AK 734002 904.0 365.0 16.0 NaN NaN 1.0 ... NaN NaN 0 0 0 0 0 NaN 2.2 1614.8044
1 2020-06-29 Alabama AL 4908621 37175.0 17380.0 715.0 2725.0 NaN NaN ... NaN 36682.0 0 0 0 0 0 NaN 3.1 15216.7251
2 2020-06-29 Arkansas AR 3038999 20257.0 5926.0 300.0 1380.0 NaN 63.0 ... NaN 20257.0 0 0 0 0 0 NaN 3.2 9724.7968
3 2020-06-29 Arizona AZ 7378494 74533.0 63766.0 2721.0 4634.0 679.0 465.0 ... NaN 74119.0 0 0 0 0 0 NaN 1.9 14019.1386
4 2020-06-29 California CA 39937489 216550.0 NaN 6179.0 NaN 1686.0 NaN ... NaN 216550.0 0 0 0 0 0 NaN 1.8 71887.4802
5 2020-06-29 Colorado CO 5845526 32307.0 26172.0 234.0 5401.0 NaN NaN ... NaN 29476.0 0 0 0 0 0 NaN 1.9 11106.4994
6 2020-06-29 Connecticut CT 3563077 46362.0 33989.0 99.0 10268.0 NaN NaN ... NaN 44384.0 0 0 0 0 0 NaN 2.0 7126.1540
7 2020-06-29 District of Columbia DC 720687 10292.0 8541.0 121.0 NaN 34.0 23.0 ... NaN 10292.0 0 0 0 0 0 NaN 4.4 3171.0228
8 2020-06-29 Delaware DE 982895 11376.0 4204.0 72.0 NaN 15.0 NaN ... NaN 10306.0 0 0 0 0 0 NaN 2.2 2162.3690
9 2020-06-29 Florida FL 21992985 146341.0 NaN NaN 14651.0 NaN NaN ... 2106109.0 146341.0 0 0 0 0 0 NaN 2.6 57181.7610
10 2020-06-29 Georgia GA 10736059 79417.0 NaN 1359.0 10824.0 NaN NaN ... 750073.0 79417.0 0 0 0 0 0 NaN 2.4 25766.5416
11 2020-06-29 Hawaii HI 1412687 899.0 162.0 NaN 111.0 NaN NaN ... 88967.0 899.0 0 0 0 0 0 NaN 1.9 2684.1053
12 2020-06-29 Iowa IA 3179849 28782.0 10223.0 119.0 NaN 35.0 18.0 ... NaN 28782.0 0 0 0 0 0 NaN 3.0 9539.5470
13 2020-06-29 Idaho ID 1826156 5319.0 1330.0 NaN 312.0 NaN NaN ... NaN 4790.0 0 0 0 0 0 NaN 1.9 3469.6964
14 2020-06-29 Illinois IL 12659682 143514.0 NaN 1501.0 NaN 372.0 187.0 ... NaN 142461.0 0 0 0 0 0 NaN 2.5 31649.2050
15 2020-06-29 Indiana IN 6745354 45228.0 8256.0 626.0 7024.0 265.0 85.0 ... NaN 45228.0 0 0 0 0 0 NaN 2.7 18212.4558
16 2020-06-29 Kansas KS 2910357 14443.0 13379.0 NaN 1152.0 NaN NaN ... NaN 14443.0 0 0 0 0 0 NaN 3.3 9604.1781
17 2020-06-29 Kentucky KY 4499692 15347.0 10848.0 381.0 2602.0 69.0 NaN ... NaN 14835.0 0 0 0 0 0 NaN 3.2 14399.0144
18 2020-06-29 Louisiana LA 4645184 57081.0 11657.0 737.0 NaN NaN 79.0 ... NaN 57081.0 0 0 0 0 0 NaN 3.3 15329.1072
19 2020-06-29 Massachusetts MA 6976597 108768.0 NaN 762.0 11345.0 138.0 79.0 ... NaN 103628.0 0 0 0 0 0 NaN 2.3 16046.1731
20 2020-06-29 Maryland MD 6083116 67254.0 59100.0 447.0 10822.0 160.0 NaN ... NaN 67254.0 0 0 0 0 0 NaN 1.9 11557.9204
21 2020-06-29 Maine ME 1345790 3219.0 491.0 31.0 347.0 8.0 4.0 ... 89123.0 2863.0 0 0 0 0 0 NaN 2.5 3364.4750
22 2020-06-29 Michigan MI 10045029 70223.0 12963.0 557.0 NaN 193.0 106.0 ... 959152.0 63497.0 0 0 0 0 0 NaN 2.5 25112.5725
23 2020-06-29 Minnesota MN 5700671 35861.0 3166.0 278.0 4031.0 140.0 NaN ... NaN 35861.0 0 0 0 0 0 NaN 2.5 14251.6775
24 2020-06-29 Missouri MO 6169270 21043.0 NaN 599.0 NaN NaN NaN ... 399926.0 21043.0 0 0 0 0 0 NaN 3.1 19124.7370
25 2020-06-29 Mississippi MS 2989260 26567.0 6120.0 719.0 3115.0 158.0 93.0 ... NaN 26400.0 0 0 0 0 0 NaN 4.0 11957.0400
26 2020-06-29 Montana MT 1086759 919.0 288.0 13.0 100.0 NaN NaN ... NaN 919.0 0 0 0 0 0 NaN 3.3 3586.3047
27 2020-06-29 North Carolina NC 10611862 63484.0 16621.0 843.0 NaN NaN NaN ... NaN 63484.0 0 0 0 0 0 NaN 2.1 22284.9102
28 2020-06-29 North Dakota ND 761723 3539.0 288.0 25.0 227.0 NaN NaN ... NaN 3539.0 0 0 0 0 0 NaN 4.3 3275.4089
29 2020-06-29 Nebraska NE 1952570 18899.0 5310.0 117.0 1316.0 NaN NaN ... NaN 18899.0 0 0 0 0 0 NaN 3.6 7029.2520
30 2020-06-29 New Hampshire NH 1371246 5760.0 958.0 34.0 565.0 NaN NaN ... NaN 5760.0 0 0 0 0 0 NaN 2.1 2879.6166
31 2020-06-29 New Jersey NJ 8936574 171272.0 126117.0 978.0 19847.0 225.0 185.0 ... NaN 171272.0 0 0 0 0 0 NaN 2.4 21447.7776
32 2020-06-29 New Mexico NM 2096640 11809.0 6053.0 114.0 1865.0 NaN NaN ... NaN 11809.0 0 0 0 0 0 NaN 1.8 3773.9520
33 2020-06-29 Nevada NV 3139658 17894.0 16701.0 555.0 NaN 124.0 65.0 ... NaN 17894.0 0 0 0 0 0 NaN 2.1 6593.2818
34 2020-06-29 New York NY 19440469 392930.0 297653.0 853.0 89995.0 216.0 136.0 ... NaN 392930.0 0 0 0 0 0 NaN 2.7 52489.2663
35 2020-06-29 Ohio OH 11747694 51046.0 NaN 669.0 7746.0 245.0 115.0 ... NaN 47524.0 0 0 0 0 0 NaN 2.8 32893.5432
36 2020-06-29 Oklahoma OK 3954821 13172.0 3200.0 329.0 1489.0 134.0 NaN ... 313021.0 13172.0 0 0 0 0 0 NaN 2.8 11073.4988
37 2020-06-29 Oregon OR 4301089 8485.0 5581.0 151.0 1025.0 44.0 25.0 ... 226648.0 8121.0 0 0 0 0 0 NaN 1.6 6881.7424
38 2020-06-29 Pennsylvania PA 12820878 85988.0 12304.0 635.0 NaN NaN 111.0 ... NaN 83529.0 0 0 0 0 0 NaN 2.9 37180.5462
39 2020-06-29 Rhode Island RI 1056161 16764.0 14191.0 73.0 1995.0 15.0 14.0 ... NaN 16764.0 0 0 0 0 0 NaN 2.1 2217.9381
40 2020-06-29 South Carolina SC 5210095 34644.0 20468.0 1032.0 2622.0 NaN NaN ... 324966.0 34546.0 0 0 0 0 0 NaN 2.4 12504.2280
41 2020-06-29 South Dakota SD 903027 6716.0 807.0 70.0 657.0 NaN NaN ... NaN 6716.0 0 0 0 0 0 NaN 4.8 4334.5296
42 2020-06-29 Tennessee TN 6897576 42297.0 14743.0 512.0 2599.0 NaN NaN ... 727985.0 41949.0 0 0 0 0 0 NaN 2.9 20002.9704
43 2020-06-29 Texas TX 29472295 153011.0 69273.0 5913.0 NaN NaN NaN ... NaN NaN 0 0 0 0 0 NaN 2.3 67786.2785
44 2020-06-29 Utah UT 3282115 21664.0 9291.0 254.0 1417.0 80.0 NaN ... NaN 21664.0 0 0 0 0 0 NaN 1.8 5907.8070
45 2020-06-29 Virginia VA 8626207 62189.0 52426.0 796.0 8823.0 225.0 101.0 ... NaN 59522.0 0 0 0 0 0 NaN 2.1 18115.0347
46 2020-06-29 Vermont VT 628061 1208.0 203.0 13.0 NaN NaN NaN ... NaN 1208.0 0 0 0 0 0 NaN 2.1 1318.9281
47 2020-06-29 Washington WA 7797095 31752.0 NaN 270.0 4275.0 NaN 52.0 ... NaN 31752.0 0 0 0 0 0 NaN 1.7 13255.0615
48 2020-06-29 Wisconsin WI 5851754 31033.0 8032.0 237.0 3407.0 90.0 NaN ... NaN 28058.0 0 0 0 0 0 NaN 2.1 12288.6834
49 2020-06-29 West Virginia WV 1778070 2870.0 581.0 28.0 NaN 5.0 3.0 ... NaN 2771.0 0 0 0 0 0 NaN 3.8 6756.6660

50 rows × 29 columns
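
The enrichment step above can be sketched as a left merge against a state lookup table. This is a minimal sketch, not the notebook's actual code: the names `state_info`, `add_state_info`, and `all_cases_demo` are hypothetical. It assumes `total_beds` is derived as population × bedsPerThousand / 1000, which matches the printed values (for Alaska, 734002 × 2.2 / 1000 = 1614.8044).

```python
import pandas as pd

# Two-row excerpt of the main dataframe, using values from the table above.
cases = pd.DataFrame({
    "date": pd.to_datetime(["2020-06-29", "2020-06-29"]),
    "state": ["AK", "AL"],
    "positive": [904.0, 37175.0],
})

# Hypothetical lookup table holding the state-level attributes.
state_info = pd.DataFrame({
    "abbrev": ["AK", "AL"],
    "name": ["Alaska", "Alabama"],
    "population": [734002, 4908621],
    "bedsPerThousand": [2.2, 3.1],
})


def add_state_info(cases, state_info):
    """Left-merge state attributes onto the case rows and derive total beds."""
    merged = cases.rename(columns={"state": "abbrev"}).merge(
        state_info, on="abbrev", how="left", validate="many_to_one"
    )
    merged = merged.rename(columns={"name": "state"})
    # Assumed derivation: beds per 1,000 people scaled by the population.
    merged["total_beds"] = merged["population"] * merged["bedsPerThousand"] / 1000
    return merged


all_cases_demo = add_state_info(cases, state_info)
print(all_cases_demo[["state", "abbrev", "population", "total_beds"]])
```

Passing `validate="many_to_one"` makes pandas raise an error if the lookup table contains a duplicated abbreviation, which would otherwise silently duplicate case rows.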

  • Load and clean JHU data
  • Merge JHU dataset with main dataset
#Load the Johns Hopkins data
jhu_df.tail(50)
LastUpdate ProvinceState Active Confirmed Deaths Recovered
5145 2020-06-19 Alaska 695.0 707.0 12.0 0.0
5146 2020-06-19 Arizona 42162.0 43445.0 1283.0 0.0
5147 2020-06-19 Arkansas 13720.0 13928.0 208.0 0.0
5148 2020-06-19 California 161731.0 167086.0 5355.0 0.0
5149 2020-06-19 Colorado 28248.0 29886.0 1638.0 0.0
5150 2020-06-19 Connecticut 41214.0 45440.0 4226.0 0.0
5151 2020-06-19 Delaware 10068.0 10499.0 431.0 0.0
5152 2020-06-19 District of Columbia 9376.0 9903.0 527.0 0.0
5153 2020-06-19 Florida 82865.0 85926.0 3061.0 0.0
5154 2020-06-19 Georgia 58307.0 60912.0 2605.0 0.0
5155 2020-06-19 Hawaii 745.0 762.0 17.0 0.0
5156 2020-06-19 Idaho 3654.0 3743.0 89.0 0.0
5157 2020-06-19 Illinois 128241.0 134778.0 6537.0 0.0
5158 2020-06-19 Indiana 38947.0 41438.0 2491.0 0.0
5159 2020-06-19 Iowa 24181.0 24861.0 680.0 0.0
5160 2020-06-19 Kansas 11502.0 11753.0 251.0 0.0
5161 2020-06-19 Kentucky 12677.0 13197.0 520.0 0.0
5162 2020-06-19 Louisiana 45572.0 48634.0 3062.0 0.0
5163 2020-06-19 Maine 2776.0 2878.0 102.0 0.0
5164 2020-06-19 Maryland 60213.0 63229.0 3016.0 0.0
5165 2020-06-19 Massachusetts 98653.0 106422.0 7769.0 0.0
5166 2020-06-19 Michigan 60737.0 66798.0 6061.0 0.0
5167 2020-06-19 Minnesota 30299.0 31675.0 1376.0 0.0
5168 2020-06-19 Mississippi 19703.0 20641.0 938.0 0.0
5169 2020-06-19 Missouri 16426.0 17371.0 945.0 0.0
5170 2020-06-19 Montana 635.0 655.0 20.0 0.0
5171 2020-06-19 Nebraska 17175.0 17414.0 239.0 0.0
5172 2020-06-19 Nevada 11694.0 12169.0 475.0 0.0
5173 2020-06-19 New Hampshire 5119.0 5450.0 331.0 0.0
5174 2020-06-19 New Jersey 155238.0 168107.0 12869.0 0.0
5175 2020-06-19 New Mexico 9697.0 10153.0 456.0 0.0
5176 2020-06-19 New York 354786.0 385760.0 30974.0 0.0
5177 2020-06-19 North Carolina 46972.0 48168.0 1196.0 0.0
5178 2020-06-19 North Dakota 3118.0 3193.0 75.0 0.0
5179 2020-06-19 Ohio 40489.0 43122.0 2633.0 0.0
5180 2020-06-19 Oklahoma 8989.0 9355.0 366.0 0.0
5181 2020-06-19 Oregon 6179.0 6366.0 187.0 0.0
5182 2020-06-19 Pennsylvania 78322.0 84683.0 6361.0 0.0
5183 2020-06-19 Rhode Island 15384.0 16269.0 885.0 0.0
5184 2020-06-19 South Carolina 20912.0 21533.0 621.0 0.0
5185 2020-06-19 South Dakota 6031.0 6109.0 78.0 0.0
5186 2020-06-19 Tennessee 32262.0 32770.0 508.0 0.0
5187 2020-06-19 Texas 99130.0 101259.0 2129.0 0.0
5188 2020-06-19 Utah 15687.0 15839.0 152.0 0.0
5189 2020-06-19 Vermont 1079.0 1135.0 56.0 0.0
5190 2020-06-19 Virginia 54652.0 56238.0 1586.0 0.0
5191 2020-06-19 Washington 25947.0 27192.0 1245.0 0.0
5192 2020-06-19 West Virginia 2330.0 2418.0 88.0 0.0
5193 2020-06-19 Wisconsin 23157.0 23876.0 719.0 0.0
5194 2020-06-19 Wyoming 1126.0 1144.0 18.0 0.0
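
The "merge JHU dataset with main dataset" step could look roughly like the sketch below. It is an assumption-laden illustration: the helper `merge_jhu` and the frames `main_demo` and `jhu_demo` are hypothetical, and the notebook's real join keys are not shown. Here the JHU columns `LastUpdate` and `ProvinceState` are renamed to match the main frame's `date` and `state`.

```python
import pandas as pd

# Excerpt of the JHU frame, using the column names and values printed above.
jhu_demo = pd.DataFrame({
    "LastUpdate": ["2020-06-19", "2020-06-19"],
    "ProvinceState": ["Alaska", "Arizona"],
    "Active": [695.0, 42162.0],
    "Confirmed": [707.0, 43445.0],
    "Deaths": [12.0, 1283.0],
})

# Hypothetical key columns of the main frame for the same date.
# Alabama is included only to show what an unmatched row looks like.
main_demo = pd.DataFrame({
    "date": pd.to_datetime(["2020-06-19"] * 3),
    "state": ["Alaska", "Alabama", "Arizona"],
    "abbrev": ["AK", "AL", "AZ"],
})


def merge_jhu(main, jhu):
    """Align JHU column names and dtypes with the main frame, then left-merge."""
    jhu = jhu.rename(columns={"LastUpdate": "date", "ProvinceState": "state"}).copy()
    jhu["date"] = pd.to_datetime(jhu["date"])
    # Guard against repeated snapshots for the same state and day.
    jhu = jhu.drop_duplicates(subset=["date", "state"])
    return main.merge(jhu, on=["date", "state"], how="left", indicator=True)


merged_demo = merge_jhu(main_demo, jhu_demo)
print(merged_demo)
```

The `_merge` column added by `indicator=True` marks rows that found no JHU match as `left_only`, which is a quick way to spot naming mismatches between the two sources.
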
# Grab all historical data and confirm that it reaches back to the first US case:
all_cases.tail()
date state abbrev population positive active hospitalizedCurrently hospitalizedCumulative inIcuCurrently onVentilatorCurrently ... negativeTestsViral positiveCasesViral commercialScore negativeRegularScore negativeScore positiveScore score grade bedsPerThousand total_beds
5978 2020-01-26 Washington WA 7797095 2.0 2.0 NaN NaN NaN NaN ... NaN NaN 0 0 0 0 0 NaN 1.7 13255.0615
5979 2020-01-25 Washington WA 7797095 2.0 2.0 NaN NaN NaN NaN ... NaN NaN 0 0 0 0 0 NaN 1.7 13255.0615
5980 2020-01-24 Washington WA 7797095 2.0 2.0 NaN NaN NaN NaN ... NaN NaN 0 0 0 0 0 NaN 1.7 13255.0615
5981 2020-01-23 Washington WA 7797095 2.0 2.0 NaN NaN NaN NaN ... NaN NaN 0 0 0 0 0 NaN 1.7 13255.0615
5982 2020-01-22 Washington WA 7797095 2.0 2.0 NaN NaN NaN NaN ... NaN NaN 0 0 0 0 0 NaN 1.7 13255.0615

5 rows × 29 columns

Exploratory Data Analysis of the US Dataset

Basic triad of the dataset: validating data types and data integrity of each row

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5983 entries, 0 to 5982
Data columns (total 29 columns):
date                      5983 non-null datetime64[ns]
state                     5983 non-null object
abbrev                    5983 non-null object
population                5983 non-null int64
positive                  5983 non-null float64
active                    5983 non-null float64
hospitalizedCurrently     3692 non-null float64
hospitalizedCumulative    3270 non-null float64
inIcuCurrently            1908 non-null float64
onVentilatorCurrently     1697 non-null float64
recovered                 5983 non-null float64
dataQualityGrade          5049 non-null object
lastUpdateEt              5628 non-null object
dateModified              5628 non-null object
checkTimeEt               5628 non-null object
death                     5983 non-null float64
hospitalized              3270 non-null float64
totalTestsViral           1616 non-null float64
positiveTestsViral        545 non-null float64
negativeTestsViral        546 non-null float64
positiveCasesViral        3157 non-null float64
commercialScore           5983 non-null int64
negativeRegularScore      5983 non-null int64
negativeScore             5983 non-null int64
positiveScore             5983 non-null int64
score                     5983 non-null int64
grade                     0 non-null float64
bedsPerThousand           5983 non-null float64
total_beds                5983 non-null float64
dtypes: datetime64[ns](1), float64(16), int64(6), object(6)
memory usage: 1.4+ MB
# Check above that the data types are correct, then review the combined, cleaned, validated, and merged dataset for all 50 states:
covid_df.head(50)
date state abbrev population positive active hospitalizedCurrently hospitalizedCumulative inIcuCurrently onVentilatorCurrently ... negativeTestsViral positiveCasesViral commercialScore negativeRegularScore negativeScore positiveScore score grade bedsPerThousand total_beds
0 2020-06-29 Alaska AK 734002 904.000 365.000 16.000 nan nan 1.000 ... nan nan 0 0 0 0 0 nan 2.200 1614.804
1 2020-06-29 Alabama AL 4908621 37175.000 17380.000 715.000 2725.000 nan nan ... nan 36682.000 0 0 0 0 0 nan 3.100 15216.725
2 2020-06-29 Arkansas AR 3038999 20257.000 5926.000 300.000 1380.000 nan 63.000 ... nan 20257.000 0 0 0 0 0 nan 3.200 9724.797
3 2020-06-29 Arizona AZ 7378494 74533.000 63766.000 2721.000 4634.000 679.000 465.000 ... nan 74119.000 0 0 0 0 0 nan 1.900 14019.139
4 2020-06-29 California CA 39937489 216550.000 210614.000 6179.000 nan 1686.000 nan ... nan 216550.000 0 0 0 0 0 nan 1.800 71887.480
5 2020-06-29 Colorado CO 5845526 32307.000 26172.000 234.000 5401.000 nan nan ... nan 29476.000 0 0 0 0 0 nan 1.900 11106.499
6 2020-06-29 Connecticut CT 3563077 46362.000 33989.000 99.000 10268.000 nan nan ... nan 44384.000 0 0 0 0 0 nan 2.000 7126.154
7 2020-06-29 District of Columbia DC 720687 10292.000 8541.000 121.000 nan 34.000 23.000 ... nan 10292.000 0 0 0 0 0 nan 4.400 3171.023
8 2020-06-29 Delaware DE 982895 11376.000 4204.000 72.000 nan 15.000 nan ... nan 10306.000 0 0 0 0 0 nan 2.200 2162.369
9 2020-06-29 Florida FL 21992985 146341.000 142795.000 nan 14651.000 nan nan ... 2106109.000 146341.000 0 0 0 0 0 nan 2.600 57181.761
10 2020-06-29 Georgia GA 10736059 79417.000 76633.000 1359.000 10824.000 nan nan ... 750073.000 79417.000 0 0 0 0 0 nan 2.400 25766.542
11 2020-06-29 Hawaii HI 1412687 899.000 162.000 nan 111.000 nan nan ... 88967.000 899.000 0 0 0 0 0 nan 1.900 2684.105
12 2020-06-29 Iowa IA 3179849 28782.000 10223.000 119.000 nan 35.000 18.000 ... nan 28782.000 0 0 0 0 0 nan 3.000 9539.547
13 2020-06-29 Idaho ID 1826156 5319.000 1330.000 nan 312.000 nan nan ... nan 4790.000 0 0 0 0 0 nan 1.900 3469.696
14 2020-06-29 Illinois IL 12659682 143514.000 136411.000 1501.000 nan 372.000 187.000 ... nan 142461.000 0 0 0 0 0 nan 2.500 31649.205
15 2020-06-29 Indiana IN 6745354 45228.000 8256.000 626.000 7024.000 265.000 85.000 ... nan 45228.000 0 0 0 0 0 nan 2.700 18212.456
16 2020-06-29 Kansas KS 2910357 14443.000 13379.000 nan 1152.000 nan nan ... nan 14443.000 0 0 0 0 0 nan 3.300 9604.178
17 2020-06-29 Kentucky KY 4499692 15347.000 10848.000 381.000 2602.000 69.000 nan ... nan 14835.000 0 0 0 0 0 nan 3.200 14399.014
18 2020-06-29 Louisiana LA 4645184 57081.000 11657.000 737.000 nan nan 79.000 ... nan 57081.000 0 0 0 0 0 nan 3.300 15329.107
19 2020-06-29 Massachusetts MA 6976597 108768.000 100673.000 762.000 11345.000 138.000 79.000 ... nan 103628.000 0 0 0 0 0 nan 2.300 16046.173
20 2020-06-29 Maryland MD 6083116 67254.000 59100.000 447.000 10822.000 160.000 nan ... nan 67254.000 0 0 0 0 0 nan 1.900 11557.920
21 2020-06-29 Maine ME 1345790 3219.000 491.000 31.000 347.000 8.000 4.000 ... 89123.000 2863.000 0 0 0 0 0 nan 2.500 3364.475
22 2020-06-29 Michigan MI 10045029 70223.000 12963.000 557.000 nan 193.000 106.000 ... 959152.000 63497.000 0 0 0 0 0 nan 2.500 25112.572
23 2020-06-29 Minnesota MN 5700671 35861.000 3166.000 278.000 4031.000 140.000 nan ... nan 35861.000 0 0 0 0 0 nan 2.500 14251.678
24 2020-06-29 Missouri MO 6169270 21043.000 20045.000 599.000 nan nan nan ... 399926.000 21043.000 0 0 0 0 0 nan 3.100 19124.737
25 2020-06-29 Mississippi MS 2989260 26567.000 6120.000 719.000 3115.000 158.000 93.000 ... nan 26400.000 0 0 0 0 0 nan 4.000 11957.040
26 2020-06-29 Montana MT 1086759 919.000 288.000 13.000 100.000 nan nan ... nan 919.000 0 0 0 0 0 nan 3.300 3586.305
27 2020-06-29 North Carolina NC 10611862 63484.000 16621.000 843.000 nan nan nan ... nan 63484.000 0 0 0 0 0 nan 2.100 22284.910
28 2020-06-29 North Dakota ND 761723 3539.000 288.000 25.000 227.000 nan nan ... nan 3539.000 0 0 0 0 0 nan 4.300 3275.409
29 2020-06-29 Nebraska NE 1952570 18899.000 5310.000 117.000 1316.000 nan nan ... nan 18899.000 0 0 0 0 0 nan 3.600 7029.252
30 2020-06-29 New Hampshire NH 1371246 5760.000 958.000 34.000 565.000 nan nan ... nan 5760.000 0 0 0 0 0 nan 2.100 2879.617
31 2020-06-29 New Jersey NJ 8936574 171272.000 126117.000 978.000 19847.000 225.000 185.000 ... nan 171272.000 0 0 0 0 0 nan 2.400 21447.778
32 2020-06-29 New Mexico NM 2096640 11809.000 6053.000 114.000 1865.000 nan nan ... nan 11809.000 0 0 0 0 0 nan 1.800 3773.952
33 2020-06-29 Nevada NV 3139658 17894.000 16701.000 555.000 nan 124.000 65.000 ... nan 17894.000 0 0 0 0 0 nan 2.100 6593.282
34 2020-06-29 New York NY 19440469 392930.000 297653.000 853.000 89995.000 216.000 136.000 ... nan 392930.000 0 0 0 0 0 nan 2.700 52489.266
35 2020-06-29 Ohio OH 11747694 51046.000 48228.000 669.000 7746.000 245.000 115.000 ... nan 47524.000 0 0 0 0 0 nan 2.800 32893.543
36 2020-06-29 Oklahoma OK 3954821 13172.000 3200.000 329.000 1489.000 134.000 nan ... 313021.000 13172.000 0 0 0 0 0 nan 2.800 11073.499
37 2020-06-29 Oregon OR 4301089 8485.000 5581.000 151.000 1025.000 44.000 25.000 ... 226648.000 8121.000 0 0 0 0 0 nan 1.600 6881.742
38 2020-06-29 Pennsylvania PA 12820878 85988.000 12304.000 635.000 nan nan 111.000 ... nan 83529.000 0 0 0 0 0 nan 2.900 37180.546
39 2020-06-29 Rhode Island RI 1056161 16764.000 14191.000 73.000 1995.000 15.000 14.000 ... nan 16764.000 0 0 0 0 0 nan 2.100 2217.938
40 2020-06-29 South Carolina SC 5210095 34644.000 20468.000 1032.000 2622.000 nan nan ... 324966.000 34546.000 0 0 0 0 0 nan 2.400 12504.228
41 2020-06-29 South Dakota SD 903027 6716.000 807.000 70.000 657.000 nan nan ... nan 6716.000 0 0 0 0 0 nan 4.800 4334.530
42 2020-06-29 Tennessee TN 6897576 42297.000 14743.000 512.000 2599.000 nan nan ... 727985.000 41949.000 0 0 0 0 0 nan 2.900 20002.970
43 2020-06-29 Texas TX 29472295 153011.000 69273.000 5913.000 nan nan nan ... nan nan 0 0 0 0 0 nan 2.300 67786.278
44 2020-06-29 Utah UT 3282115 21664.000 9291.000 254.000 1417.000 80.000 nan ... nan 21664.000 0 0 0 0 0 nan 1.800 5907.807
45 2020-06-29 Virginia VA 8626207 62189.000 52426.000 796.000 8823.000 225.000 101.000 ... nan 59522.000 0 0 0 0 0 nan 2.100 18115.035
46 2020-06-29 Vermont VT 628061 1208.000 203.000 13.000 nan nan nan ... nan 1208.000 0 0 0 0 0 nan 2.100 1318.928
47 2020-06-29 Washington WA 7797095 31752.000 30442.000 270.000 4275.000 nan 52.000 ... nan 31752.000 0 0 0 0 0 nan 1.700 13255.061
48 2020-06-29 Wisconsin WI 5851754 31033.000 8032.000 237.000 3407.000 90.000 nan ... nan 28058.000 0 0 0 0 0 nan 2.100 12288.683
49 2020-06-29 West Virginia WV 1778070 2870.000 581.000 28.000 nan 5.000 3.000 ... nan 2771.000 0 0 0 0 0 nan 3.800 6756.666

50 rows × 29 columns

The NaN values may indicate that there were few or no COVID-19 patients on those dates, or that the state did not report that metric. We further analyse the summary statistics of the dataset's columns to check data integrity and accuracy.
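
Before reading the summary statistics, it can help to quantify how much of each column is missing. The following is a small sketch under stated assumptions: `missing_summary` is a hypothetical helper, and the four rows are taken from the table above rather than from the full 5,983-row frame.

```python
import numpy as np
import pandas as pd

# Four-row excerpt (AK, AL, AR, AZ) of the merged frame shown above.
demo = pd.DataFrame({
    "abbrev": ["AK", "AL", "AR", "AZ"],
    "hospitalizedCurrently": [16.0, 715.0, 300.0, 2721.0],
    "inIcuCurrently": [np.nan, np.nan, np.nan, 679.0],
    "onVentilatorCurrently": [1.0, np.nan, 63.0, 465.0],
})


def missing_summary(df):
    """Return per-column missing counts and shares, most incomplete first."""
    summary = pd.DataFrame({
        "n_missing": df.isna().sum(),
        "share_missing": df.isna().mean(),
    })
    return summary.sort_values("share_missing", ascending=False)


summary = missing_summary(demo)
print(summary)
```

Applied to the full frame, the same summary would show which hospitalization columns are too sparse to be compared across states.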

# Validate the data with the mean, standard deviation, min/max, and quartiles:
covid_df.describe()
# TODO rounding up the numbers
population positive active hospitalizedCurrently hospitalizedCumulative inIcuCurrently onVentilatorCurrently recovered death hospitalized ... negativeTestsViral positiveCasesViral commercialScore negativeRegularScore negativeScore positiveScore score grade bedsPerThousand total_beds
count 5983.000 5983.000 5983.000 3692.000 3270.000 1908.000 1697.000 5983.000 5983.000 3270.000 ... 546.000 3157.000 5983.000 5983.000 5983.000 5983.000 5983.000 0.000 5983.000 5983.000
mean 6542567.734 21412.731 18878.480 1019.729 4395.342 438.068 223.071 4553.690 1112.353 4395.342 ... 298954.280 32485.798 0.000 0.000 0.000 0.000 0.000 nan 2.626 15805.896
std 7386968.184 47109.975 42211.359 1920.146 12973.791 689.428 327.295 11174.336 2937.103 12973.791 ... 395379.718 56915.582 0.000 0.000 0.000 0.000 0.000 nan 0.744 16159.530
min 567025.000 0.000 0.000 1.000 0.000 2.000 0.000 0.000 0.000 0.000 ... 17.000 0.000 0.000 0.000 0.000 0.000 0.000 nan 1.600 1318.928
25% 1778070.000 644.000 560.000 119.000 224.250 81.000 35.000 0.000 13.000 224.250 ... 50397.750 5060.000 0.000 0.000 0.000 0.000 0.000 nan 2.100 3773.952
50% 4499692.000 5231.000 4603.000 402.000 985.500 181.000 93.000 236.000 150.000 985.500 ... 144362.500 13938.000 0.000 0.000 0.000 0.000 0.000 nan 2.500 11557.920
75% 7797095.000 21193.500 17615.000 1024.250 3296.000 479.250 246.000 3217.000 799.000 3296.000 ... 372973.250 35861.000 0.000 0.000 0.000 0.000 0.000 nan 3.100 19124.737
max 39937489.000 392930.000 356899.000 18825.000 89995.000 5225.000 2425.000 81335.000 24842.000 89995.000 ... 2106109.000 392930.000 0.000 0.000 0.000 0.000 0.000 nan 4.800 71887.480

8 rows × 22 columns

#final_100k_last_month.head()
# Review the output for per-capita measures:
final_100k_last_month.describe()
positive_100k active_100k recovered_100k death_100k hospitalizedCumulative_100k inIcuCurrently_100k onVentilatorCurrently_100k BedsPer100k
count 61.000 61.000 61.000 61.000 61.000 62.000 62.000 62.000
mean 358.759 336.008 170.212 17.931 34.329 113.658 62.620 13440.000
std 65.620 442.921 105.723 7.283 42.821 26.916 13.514 0.000
min 245.203 -2213.482 35.481 4.880 -93.926 70.613 39.353 13440.000
25% 308.315 292.339 107.989 12.184 21.638 94.079 53.461 13440.000
50% 344.558 332.717 147.227 17.253 25.122 111.563 62.120 13440.000
75% 405.031 370.778 211.312 23.811 29.823 126.991 74.683 13440.000
max 544.349 2291.210 626.665 33.917 246.371 167.561 94.521 13440.000
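
The per-capita columns above follow the usual normalisation: count / population × 100,000. How `final_100k_last_month` was actually built is not shown, so the sketch below is only an illustration with a hypothetical helper `per_100k` and two rows taken from the earlier table.

```python
import pandas as pd

# Two-row excerpt (AK, AL) of the merged frame shown earlier.
demo = pd.DataFrame({
    "abbrev": ["AK", "AL"],
    "population": [734002, 4908621],
    "positive": [904.0, 37175.0],
    "bedsPerThousand": [2.2, 3.1],
})


def per_100k(df, columns):
    """Scale the given count columns to a rate per 100,000 residents."""
    out = df[["abbrev"]].copy()
    for col in columns:
        out[col + "_100k"] = df[col] / df["population"] * 100_000
    return out


rates = per_100k(demo, ["positive"])
# Beds per 1,000 people converts to beds per 100,000 by a factor of 100.
rates["BedsPer100k"] = demo["bedsPerThousand"] * 100

# For a combined rate, sum counts and populations first, then divide.
combined_positive_100k = demo["positive"].sum() / demo["population"].sum() * 100_000
print(rates)
print(combined_positive_100k)
```

Summing per-state rates does not give a meaningful multi-state rate; dividing the summed counts by the summed population does.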

Graphical Exploratory Analysis

Plotting histograms, scatterplots and boxplots to assess the distribution of the entire US dataset.

#Validate all US data:
timeseries_usa_df.tail()
date positive_100k active_100k recovered_100k death_100k hospitalizedCurrently_100k inIcuCurrently_100k onVentilatorCurrently_100k BedsPer100k
155 2020-06-25 33812.912 19730.969 12498.864 1583.079 414.087 67.864 36.962 13440.000
156 2020-06-26 34335.924 20098.997 12643.998 1592.929 404.115 67.051 34.318 13440.000
157 2020-06-27 34829.638 20417.559 12812.241 1599.839 407.257 68.533 35.118 13440.000
158 2020-06-28 35334.565 20809.528 12921.408 1603.630 402.011 65.968 33.930 13440.000
159 2020-06-29 35834.140 20955.190 13269.151 1609.799 409.624 66.395 32.901 13440.000

Analysis of Hospitalizations by State

New York

[4 plots; y-axis label: 'No. Patients']

Alabama

[4 plots; y-axis labels: 'No. Patients' (3), 'No. Killed' (1)]

Arizona

[5 plots; y-axis labels: 'No. Patients' (4), '% Positive Cases in Hospital' (1)]

Arkansas

[3 plots; y-axis label: 'No. Patients']

California

[4 plots; y-axis label: 'No. Patients']

Colorado

[3 plots; y-axis labels: 'No. Patients' (1), 'No. Killed' (2)]

Connecticut

[3 plots; y-axis labels: 'No. Patients' (1), 'No. Killed' (2)]

Delaware

[2 plots; y-axis labels: 'No. Patients' (1), 'No. Killed' (1)]

Florida

[2 plots; y-axis label: 'No. Patients']
# TODO: fix legend/axis/plot altogether
# Time-series plot of cumulative positive viral tests in Florida
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(16, 12))
ax.plot(fl.date, fl.positiveTestsViral, linewidth=4.7, color='r')
ax.set_title('Cumulative Number of Positive Viral Tests in Florida', fontsize=23)
ax.set_xlabel('Date')
ax.set_ylabel('No. Patients')
[2 plots; y-axis labels: 'No. Patients' (1), '% Infected' (1)]

Georgia

[4 plots; y-axis labels: 'No. Patients' (3), '% Infection Rate' (1)]

Hawaii

[6 plots; y-axis labels: 'No. Patients' (1), 'No. Killed' (4), '% Infected' (1)]

Idaho

[3 plots; y-axis labels: 'No. Patients' (2), 'No. Killed' (1)]

Iowa

[4 plots; y-axis labels: 'No. Patients' (3), 'No. Killed' (1)]

Kansas

[2 plots; y-axis label: 'No. Patients']

Kentucky

[3 plots; y-axis label: 'No. Patients']

Louisiana

[4 plots; y-axis labels: 'No. Patients' (3), 'No. Killed' (1)]

Maine

[4 plots; y-axis labels: 'No. Patients' (3), 'No. Killed' (1)]

Maryland

[5 plots; y-axis labels: 'No. Patients' (4), 'No. Killed' (1)]

Alabama:

South Carolina:

Texas:

Nevada:

Mississippi:

Utah:

Oklahoma:

Assessing Correlation of Independent Variables
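
One common way to assess correlation among candidate predictors is a pairwise Pearson correlation matrix. The sketch below is an assumption, not the notebook's method: the helper `correlated_pairs` and the 0.8 threshold are hypothetical, and the six rows (AK, AL, AR, AZ, CO, CT) are taken from the table earlier in this document.

```python
import numpy as np
import pandas as pd

# Six-row excerpt of the merged frame; all three columns are numeric.
demo = pd.DataFrame({
    "positive": [904.0, 37175.0, 20257.0, 74533.0, 32307.0, 46362.0],
    "active": [365.0, 17380.0, 5926.0, 63766.0, 26172.0, 33989.0],
    "hospitalizedCurrently": [16.0, 715.0, 300.0, 2721.0, 234.0, 99.0],
})


def correlated_pairs(df, threshold=0.8):
    """Return the correlation matrix and the column pairs with |r| >= threshold."""
    corr = df.corr()  # Pearson by default; missing values are excluded pairwise
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = float(corr.loc[a, b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return corr, pairs


corr, pairs = correlated_pairs(demo)
print(corr.round(3))
print(pairs)
```

Strongly correlated predictors carry overlapping information, so a pair flagged here is a candidate for dropping one of the two columns before modelling.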

Build Model for Dependent Variable

  • To be used to predict current hospitalizations
  • More complete data for inIcuCurrently and onVentilatorCurrently would allow us to predict those numbers as well.
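
As a baseline for predicting current hospitalizations, an ordinary least squares fit is one simple option. The model the notebook actually builds is not shown, so this is a swapped-in sketch: `fit_ols` and `predict` are hypothetical helpers, the choice of `positive` and `active` as features is an assumption, and the seven rows come from the table earlier in this document.

```python
import numpy as np
import pandas as pd

# Seven-row excerpt (AK, AL, AR, AZ, CO, CT, HI); HI lacks the target value.
demo = pd.DataFrame({
    "abbrev": ["AK", "AL", "AR", "AZ", "CO", "CT", "HI"],
    "positive": [904.0, 37175.0, 20257.0, 74533.0, 32307.0, 46362.0, 899.0],
    "active": [365.0, 17380.0, 5926.0, 63766.0, 26172.0, 33989.0, 162.0],
    "hospitalizedCurrently": [16.0, 715.0, 300.0, 2721.0, 234.0, 99.0, np.nan],
})


def fit_ols(df, features, target):
    """Fit intercept + linear coefficients on rows with no missing values."""
    data = df[features + [target]].dropna()
    X = np.column_stack([np.ones(len(data)), data[features].to_numpy(dtype=float)])
    y = data[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict(coef, df, features):
    """Apply the fitted coefficients to every row, including unlabeled ones."""
    X = np.column_stack([np.ones(len(df)), df[features].to_numpy(dtype=float)])
    return X @ coef


features = ["positive", "active"]
target = "hospitalizedCurrently"
coef = fit_ols(demo, features, target)
predictions = predict(coef, demo, features)
print(dict(zip(["intercept"] + features, coef)))
```

With so few rows this only illustrates the mechanics; a usable model would need the full history, a held-out test period, and a check of the correlated predictors.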